fix(excel-html): chart overlays match Excel position and size #152
Open
dragonwhites wants to merge 3703 commits into
chart title color emitted bare hex (color:FF0000) and the axis/legend/gridline color self-sync re-emitted bare hex (fill="0000FF") - invalid CSS/SVG, so browsers rendered them black. Route all through a CssHexColor helper (idempotent for #/named values). Also honor the pptx title color (was ignored, using theme tx1). Shared renderer: xlsx/pptx/docx all fixed.
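The idempotent normalisation described above could look roughly like the following. This is a minimal Python sketch, not the project's CssHexColor; the name `css_hex_color` and the choice to recognise only 6-digit hex are assumptions made for illustration.

```python
import re

# Assumption: only bare 6-digit RGB hex needs the '#' prefix.
_BARE_HEX = re.compile(r"^[0-9A-Fa-f]{6}$")

def css_hex_color(value: str) -> str:
    """Return a CSS/SVG-safe color string.

    Bare hex gets a '#' prefix; values that already start with '#'
    and named colors pass through unchanged, so calling this twice
    gives the same result as calling it once.
    """
    v = value.strip()
    if v.startswith("#"):
        return v
    if _BARE_HEX.match(v):
        return "#" + v
    return v
```

Because the helper is idempotent, every emit site can call it without tracking whether an upstream step already normalised the value.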
a line shape collapsed to one dimension (height=0 or width=0) rendered invisible: the solid path drew a degenerate strip and the SVG-dash path computed a negative rect and vanished. Route a non-line-preset outlined, text-less shape whose box collapses through RenderConnector, which draws zero-dimension lines with the correct color/width/dash.
writing a value to a table header cell produced two corruptions that made the workbook unopenable in Excel (our lenient reader masked both):
1. the cell kept a stale <is> inline-string placeholder alongside the new <v> (dual value children) - clear InlineString on every value/formula/clear write path.
2. the table column name (Column1/Column2 auto-placeholder) no longer matched the overwritten header text - re-sync <tableColumn name> to the header cell on writes within a table's header row.
Verified: real Excel now opens the previously-rejected file.
the sparkline SVG stroke/fill was hardcoded to #4472C4, ignoring the stored series color (x14 colorSeries). Read group.SeriesColor and use it for the line stroke and column-bar fill, falling back to the default blue only when none is stored.
margin-top was emitted as spaceBefore minus the previous paragraph's spaceAfter (a stale 'flexbox doesn't collapse' assumption). Paragraphs are block-flow siblings whose margins collapse to the max, so the subtraction shrank every gap — and for small spaceBefore it dropped to 0, ignoring the spacing entirely. Emit the full spaceBefore; CSS margin collapse then yields max(prevAfter, thisBefore), matching Word.
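To make the arithmetic concrete, here is a small Python sketch of the gap a browser renders before and after the change. The function names are hypothetical, and the clamp at 0 in the old emission is an assumption inferred from "it dropped to 0".

```python
def collapsed_gap(margin_bottom: float, margin_top: float) -> float:
    # Adjacent non-negative vertical margins of block-flow siblings
    # collapse to the larger of the two.
    return max(margin_bottom, margin_top)

def old_margin_top(prev_space_after: float, space_before: float) -> float:
    # Pre-fix emission: spaceBefore minus the previous spaceAfter
    # (floor at 0 is an assumption).
    return max(0.0, space_before - prev_space_after)

def new_margin_top(prev_space_after: float, space_before: float) -> float:
    # Post-fix emission: the full spaceBefore; the browser does the collapsing.
    return space_before
```

With a previous spaceAfter of 10 and a spaceBefore of 12, the old emission yields a rendered gap of 10 while the new one yields 12, which is max(prevAfter, thisBefore).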
bar/column/stacked/area/pie/doughnut fills fell back to opacity=0.85 when a series declared no explicit a:alpha, washing every default chart ~15% lighter than native Office (which renders opaque). Default the FillOpacity fallback to 1.0; an explicit a:alpha still overrides. Shared renderer — xlsx/pptx/docx all fixed.
a w:br type=page emitted the page-transition </div></div> markup while the run <span> and paragraph <p> were still open, producing invalid nesting (browsers auto-recovered). Close </span></p> before the break marker and reopen an identical <p> for any remaining runs, mirroring the column-break path. Nesting validator now clean; break still paginates.
cellIs equal/notEqual rules against a text operand never applied in the preview — the evaluator gated on a numeric cell value and compared numerically only. Now equal/notEqual fall back to a case-insensitive string compare (stripping the quotes from the "..." formula literal) when either side is non-numeric; numeric rules unchanged.
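A rough Python sketch of the fallback described above, assuming cell values and operands arrive as strings; `cell_is_equal` and `_as_number` are hypothetical names, not the evaluator's actual API.

```python
def _as_number(text):
    """Return the value as a float, or None when it is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def cell_is_equal(cell_value: str, operand: str) -> bool:
    a, b = _as_number(cell_value), _as_number(operand)
    if a is not None and b is not None:
        return a == b  # numeric rules keep the numeric comparison
    literal = operand
    # Strip the quotes from a "..." formula literal.
    if len(literal) >= 2 and literal[0] == '"' and literal[-1] == '"':
        literal = literal[1:-1]
    return cell_value.casefold() == literal.casefold()

def cell_is_not_equal(cell_value: str, operand: str) -> bool:
    return not cell_is_equal(cell_value, operand)
```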
…ed by Office an explicit color on a line/scatter/radar series was written to a bare <c:spPr><a:solidFill> — which real PowerPoint ignores for the line stroke (it uses the theme color), while the HTML preview read it and showed the requested color. The preview thus lied. Write line-based series colors into <a:ln><a:solidFill> so Office honors them; reader prefers a:ln then falls back to bare fill. Bar/column/pie/area keep the bare area fill. Verified in real PowerPoint: line now renders red.
a General/unformatted numeric cell holding an IEEE-754-noisy value (e.g. 99.98999999999999 from computed/imported/round-tripped data) showed the raw double; Excel rounds General to ~15 significant digits (99.99). FormatGeneralNumber's normal-magnitude branch now uses G15. Scientific branch and explicitly-number-formatted cells unchanged.
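The effect of rounding to 15 significant digits can be reproduced in Python with the `.15g` format, which behaves like G15 for values in the normal-magnitude range. This is an illustration of the rounding only, not of FormatGeneralNumber itself.

```python
def format_general(value: float) -> str:
    # 15 significant digits; trailing zeros are dropped by the 'g' format.
    return format(value, ".15g")

print(format_general(99.98999999999999))  # prints 99.99
print(format_general(0.1 + 0.2))          # prints 0.3
```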
filled radar (radarStyle=filled) series polygons rendered at fill-opacity=0.2 — nearly invisible vs PowerPoint's vibrant ~0.7 fill. Raised to 0.7, scoped to the radar renderer; line-chart area-under fills untouched. Verified against native PowerPoint.
font-family always terminated in sans-serif, so Courier New/Consolas fell back to a proportional font when unavailable, breaking monospace alignment. Pick the generic family by font name: monospace fonts -> monospace, serif fonts -> serif, else sans-serif.
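A minimal Python sketch of choosing the generic family by font name. The two name sets are illustrative assumptions; the lists the renderer actually uses are not given in the commit.

```python
# Assumed, non-exhaustive name sets for illustration only.
MONOSPACE_FONTS = {"courier new", "consolas", "menlo", "monaco", "lucida console"}
SERIF_FONTS = {"times new roman", "georgia", "cambria", "garamond"}

def generic_family(font_name: str) -> str:
    key = font_name.strip().lower()
    if key in MONOSPACE_FONTS:
        return "monospace"
    if key in SERIF_FONTS:
        return "serif"
    return "sans-serif"

def css_font_family(font_name: str) -> str:
    # The named font first, then the matching generic fallback.
    return f"'{font_name}', {generic_family(font_name)}"
```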
non-stacked area series were emitted in reverse index order, so the wrong series sat on top vs PowerPoint (which paints higher-idx on top). Paint series0 first (bottom) ... seriesN last (top). Verified against real PowerPoint: same series now on top in both.
a chart series/categories given as a cell range (series1=B1:B4) emitted a numRef/strRef with no cached values, so the HTML preview plotted nothing (the dataRange= path already cached). Backfill the cache from the referenced cells, mirroring dataRange=. Verified: HTML renders the bars and real Excel opens+shows the chart.
table cell content uses 1-based keys (r1c1, matching Word); 0-based or mistyped keys (r0c0) were silently dropped because the border fan-out used a .Where() over TrackingPropertyDictionary, marking every key read and wiping the unsupported signal. Scan via Keys+TryGetValue so unread keys surface as unsupported_property, and hint 'did you mean r1c1 (cell keys are 1-based)'.
Sheets that declare millions of value-less, formula-less, style-less empty cells balloon the SDK DOM to GBs. WorksheetBloatFilter strips them before the SDK parses the part (lossless — Excel and LibreOffice discard such cells on load too). Gated: normal files take the original direct-stream path untouched. Filtered sessions operate on a slimmed in-memory copy and write back to the backing file on save/close so mid-session snapshots and final closes still hit disk.
…g; share smooth/trendline/error-bar primitives with the line renderer
Scatter charts had two overlapping implementations: a dedicated RenderScatterChartSvg (per-series xVal/yVal value axes) and a parallel path that threaded scatter X through RenderLineChartSvg. The line path was already shadowed by the scatter-PlotArea dispatch branch, so it was dead code, and keeping both invited drift.
Consolidate onto RenderScatterChartSvg and delete the line-renderer scatter path (the scatterX parameter, its value-axis block, and the dispatch-side category parsing). Scatter keeps its correct per-series data model.
To avoid duplicating decorations, extract three primitives from RenderLineChartSvg and call them from both renderers:
- BuildSmoothPath: Catmull-Rom -> cubic Bezier path
- AppendErrorBars: vertical (Y) error bars at each point
- AppendTrendline: regression line; takes the X data + an X-value->pixel mapper, so the caller owns axis positioning
Because the scatter renderer feeds AppendTrendline the real X values (xVal), a scatter trendline now fits over the true X domain. The previous trendline code always regressed over the 1-based category index, which produced a wrong slope/equation for non-uniformly-spaced scatter X (e.g. X=10,20,40,80,160 -> correct slope 0.3649, index-based slope 13.1). Line/category charts still pass the index as X, so their output is unchanged (verified byte-identical).
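The difference between regressing over real X values and over the category index can be sketched as follows. This is a plain least-squares fit in Python with hypothetical names; the Y values in the usage note are made up for illustration and are not the data behind the slopes quoted in the commit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trendline_endpoints(xs, ys, x_to_px, y_to_px):
    """Endpoints of the fitted line in pixel space.

    The caller supplies the value->pixel mappers, so axis positioning
    stays with the renderer that owns the axes.
    """
    slope, intercept = fit_line(xs, ys)
    x0, x1 = min(xs), max(xs)
    return (
        (x_to_px(x0), y_to_px(slope * x0 + intercept)),
        (x_to_px(x1), y_to_px(slope * x1 + intercept)),
    )
```

For example, with X = 10, 20, 40, 80, 160 and invented Y = 0.5·X + 3, fitting over X recovers a slope of 0.5, while fitting over the index 1..5 gives 18, a line that does not describe the data in X units.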
A column of chained formulas (B1=A1, Bi=B{i-1}+Ai) silently produced a
wrong value once the chain exceeded 256 links: B270 over A1..A270=1..270
returned 36480 instead of 36585, exact through link 256 and drifting
after.
Root cause: _parseDepth, the per-formula parenthesis-nesting guard
(cap 256, a DoS backstop), was never reset when ResolveCellResult
re-entered EvaluateFormula to evaluate a referenced cell. Each chain
link leaked one frame into the counter, so at link 257 the parser bailed
mid-expression and the evaluator fell back to a blank cached value -
a silent zero folded into every downstream sum.
Fix: save and zero _parseDepth around the nested EvaluateFormula call
and restore it in finally, so the cap counts only the current formula's
own nesting. Cross-cell recursion depth stays guarded by the existing
TryEnsureSufficientExecutionStack probe and the MaxSameSheetDepth=1000
backstop, which surface a visible #NUM! instead of truncating silently.
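The save/zero/restore pattern can be illustrated with a toy evaluator in Python. Everything here (class, method names, the additive-only formula grammar) is a hypothetical stand-in; unlike the original, the leaky variant raises instead of silently falling back to a blank value, so the failure is visible.

```python
import re

class ParseDepthExceeded(Exception):
    """Raised when one formula nests deeper than the cap."""

class Evaluator:
    def __init__(self, cells, max_parse_depth=256, isolate_depth=True):
        self.cells = cells                  # name -> number or "=formula"
        self.max_parse_depth = max_parse_depth
        self.isolate_depth = isolate_depth  # False reproduces the leak
        self._parse_depth = 0

    def evaluate_formula(self, text):
        self._parse_depth += 1
        try:
            if self._parse_depth > self.max_parse_depth:
                raise ParseDepthExceeded(text)
            total = 0.0
            for term in text.split("+"):    # toy grammar: sums only
                term = term.strip()
                if re.fullmatch(r"[0-9.]+", term):
                    total += float(term)
                else:
                    total += self.resolve_cell(term)
            return total
        finally:
            self._parse_depth -= 1

    def resolve_cell(self, name):
        value = self.cells[name]
        if not (isinstance(value, str) and value.startswith("=")):
            return float(value)
        if not self.isolate_depth:
            return self.evaluate_formula(value[1:])
        saved = self._parse_depth
        self._parse_depth = 0               # the nested formula starts at 0
        try:
            return self.evaluate_formula(value[1:])
        finally:
            self._parse_depth = saved       # restore the caller's depth

def build_chain(n):
    """A1..An = 1..n, B1 = A1, Bi = B(i-1) + Ai."""
    cells = {f"A{i}": i for i in range(1, n + 1)}
    cells["B1"] = "=A1"
    for i in range(2, n + 1):
        cells[f"B{i}"] = f"=B{i - 1}+A{i}"
    return cells
```

With the counter isolated per formula, `Evaluator(build_chain(270)).resolve_cell("B270")` returns 36585.0; with `isolate_depth=False` the same chain trips the cap partway down.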
… HTML preview A containsBlanks rule over D1:D5 with only D1 populated never showed its fill: the HTML grid sized itself to the used data extent (one row), so the blank in-range cells D2:D5 were not rendered at all and the CF map was clamped to the same bounds. Fix: merge raw conditional-formatting sqref extents (CfRangeExtents) into both the grid dimensions and the CF evaluation bounds, clamped through the existing row/column caps so whole-column references stay bounded. containsBlanks evaluation already treats missing cells as blank, so the synthesized cells pick up the fill.
\begin{aligned}a=1 \\ b=2\end{aligned} produced an OMML matrix (m:m)
instead of an equation array (m:eqArr), and the dump side then
serialized that matrix back as \begin{matrix}, so the equation lost its
alignment semantics in Word and the round-trip drifted on first pass.
Root cause: the align-family environments (align, aligned, gathered,
split) were routed through ParseMatrix alongside the true matrix
environments. The OMML-to-LaTeX direction already handled eqArr; it
just never fired because parse never produced one.
Fix: after ParseMatrix, convert each matrix row into one m:e of an
m:eqArr for align-family environments. pmatrix/bmatrix/matrix/cases
keep producing m:m. Round-trip is now stable: aligned -> eqArr ->
aligned.
…ional points-to-EMU multiply
Setting axis/border line specs (valAxisLine=FF0000:12700:dash) with an EMU-scale width wrote a:ln/@w = 161290000 - the width slot was always multiplied by 12700 (points to EMU), so values copied from real OOXML overflowed the ST_LineWidth maximum of 20116800 and the document failed schema validation. The dump side emits widths verbatim, so dump->batch round-trips faithfully replayed the invalid width.
Fix: shared TryParseLineWidthEmu. Bare decimals keep their documented meaning of points; unit-qualified values (1pt, 0.5mm, 12700emu) go through EmuConverter.ParseEmu; bare integers above the legal point ceiling are treated as raw EMU per the ParseEmu convention; the result is clamped to the schema range so Set can never emit an invalid width.
Applied to every colon-spec and dotted .width mutator that shared the same multiply: series.outline, gridline specs, valAxisLine/catAxisLine, plotArea.border, chartArea.border, chart lineWidth, seriesN.lineWidth.
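One way to read the parsing rules above, as a Python sketch. The function name, the suffix table, and the exact definition of the "legal point ceiling" (taken here as 20116800 / 12700 = 1584 points) are assumptions; the real TryParseLineWidthEmu and EmuConverter.ParseEmu may differ in detail.

```python
EMU_PER_POINT = 12700
MAX_LINE_WIDTH_EMU = 20116800                       # ST_LineWidth maximum
MAX_POINTS = MAX_LINE_WIDTH_EMU / EMU_PER_POINT     # 1584.0

# Assumed unit suffixes and EMU factors (36000 EMU per millimetre).
_UNIT_FACTORS = {"emu": 1, "pt": EMU_PER_POINT, "mm": 36000}

def parse_line_width_emu(text: str) -> int:
    s = text.strip().lower()
    for suffix, factor in _UNIT_FACTORS.items():
        if s.endswith(suffix):
            emu = float(s[: -len(suffix)]) * factor
            break
    else:
        number = float(s)
        if "." not in s and number > MAX_POINTS:
            emu = number                  # bare integer too large for points: raw EMU
        else:
            emu = number * EMU_PER_POINT  # bare value keeps its meaning of points
    # Clamp so an out-of-range width can never be emitted.
    return int(min(max(round(emu), 0), MAX_LINE_WIDTH_EMU))
```

Under these assumptions `12700` (copied from real OOXML) stays 12700 EMU instead of becoming 161290000, while `1` and `1pt` both map to 12700 EMU.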
…adback
Adding a shape with underline=true or strikethrough=single and no text silently lost both properties: Get returned a Format without the keys, so dump->replay dropped them.
Root cause: the write side already stored them on endParaRPr (the same RunPropTargets fallback bold/italic use for runless shapes), but the runless-shape reader surfaced only font/size/bold/italic/caps/color from endParaRPr - underline and strike were never read back.
Fix: mirror the run-present underline/strike readers into the endParaRPr branch (sng->single, dbl->double; SingleStrike/DoubleStrike/NoStrike mapping), so the keys round-trip for shapes that have no text yet.
…eview
highlight= was rejected as UNSUPPORTED on the Add path, had no curated Set case, and never surfaced on Get, so a text highlight could not be authored and view html emitted no background-color. The one existing writer (find/replace formatting) built the color via BuildSolidFillColor, which wrote 8-digit AARRGGBB into a:srgbClr/@Val - invalid ST_HexColorRGB that renderers fell back to white.
Fix:
- Set: curated highlight case writes <a:highlight> through BuildColorElement, positioned by ReorderDrawingRunProperties so the rPr child order stays schema-valid (PowerPoint silently ignores out-of-order children).
- Add: highlight joins the shape effectKeys and the AddRun branch (fill-before-latin slot), accepted instead of UNSUPPORTED.
- Get: new ReadColorFromHighlight readback at run and shape level, canonical #-prefixed uppercase hex.
- find/replace path repointed to BuildColorElement, fixing the invalid 8-digit srgbClr val.
- schemas/help/pptx/{run,shape}.json declare the property.
The HTML preview already mapped a:highlight to background-color; it now receives real data.
Adding a field (or run) with color=accent1 writes <w:color w:val="auto" w:themeColor="accent1"/>, but Get returned the raw compound "auto;themeColor=accent1" instead of "accent1", breaking the canonical rule that scheme colors pass through unchanged. val="auto" carries no color information - Word resolves the run color from the theme slot - so the compound head is pure noise for the pure-theme form. StyleColorWithThemeTail now collapses auto + themeColor (no shade/tint) to the bare scheme name. An explicit hex val alongside themeColor keeps the full "HEX;themeColor=..." tail, and any themeShade/themeTint modifier keeps the compound form too - those carry information the bare name would lose.
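The collapse rule can be sketched in Python as below. `style_color` is a hypothetical name, and the `themeShade=`/`themeTint=` tail keys are assumed to follow the same pattern as the `themeColor=` key quoted above.

```python
def style_color(val, theme_color=None, theme_shade=None, theme_tint=None):
    """Canonical readback for a w:color element's attributes."""
    if theme_color is None:
        return val
    # Pure-theme form: val="auto" carries no information, so collapse
    # to the bare scheme name.
    if val == "auto" and theme_shade is None and theme_tint is None:
        return theme_color
    # Explicit hex or a shade/tint modifier: keep the compound form.
    parts = [val, f"themeColor={theme_color}"]
    if theme_shade is not None:
        parts.append(f"themeShade={theme_shade}")
    if theme_tint is not None:
        parts.append(f"themeTint={theme_tint}")
    return ";".join(parts)
```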
…hema-valid OOXML
Two table rebuild defects produced documents the OpenXML schema validator rejects:
1. add table with rowBandSize/colBandSize wrote w:tblStyleRowBandSize / w:tblStyleColBandSize as direct tblPr children. The SDK's CT_TblPr particle for a table instance has no slot for them (they belong to a table STYLE's tblPr), so the rebuilt document failed validation with 'invalid child element tblStyleRowBandSize'. They are now emitted inside an mc:AlternateContent/mc:Choice Requires="w" guard: Word processes the choice transparently per MCE rules (identical runtime behavior), while strict schema validation passes. The band-size reader unwraps one level of AlternateContent/Choice so readback and dump still surface rowBandSize/colBandSize - scoped to direct children, a tblPrChange snapshot does not leak its prior band sizes onto the live table.
2. Replaying a tracked-change cell merge appended w:tcPr AFTER the cell's w:p; CT_Tc requires tcPr as the first child, so validation failed with 'unexpected child element tcPr'. The replay path now inserts tcPr at the schema position.
Both round-trips validate clean with attributes preserved.
…o the paragraph mark A paragraph whose content is a field chain with a formatted cached result (e.g. a REF field whose first result run is bold) rebuilt with the formatting duplicated: <w:b/> appeared both on the result run and in <w:pPr><w:rPr> - the paragraph mark - so the rebuilt XML carried two bolds where the source had one. Root cause: Get surfaces first-run formatting on the paragraph node (firstRun-fallback), and the emitter strips those harvested keys from the paragraph props only when a run-typed child exists. A field chain swallows all its text runs in CollapseFieldChains, leaving no run-typed children - so the strip never fired, the harvested bold rode the 'add p' op, and the field emit (verbatim raw-set / add field) replayed the same formatting a second time on the result run. Fix: field entries count as format-bearing hoist sources, so the paragraph-level strip fires for field-chain paragraphs too. The paragraph mark keeps only genuine pPr/rPr formatting.
…uding single-run paragraphs
Inline <w:customXml> wrappers flatten to their inner runs on dump->batch (structure is not replayable), but unlike the parallel smartTag flatten the loss was silent: no warning reached the dump envelope or view issues, so a consumer had no machine-readable signal that wrapper semantics were dropped. Two gaps:
1. A run-level <w:customXml> parses as a typed CustomXmlRun (smartTag parses as an unknown element and already took the marking path), so its inner runs were never stamped _wrapperFlattened.
2. The single-run collapse path folds the wrapped run into the paragraph's own text prop and bypassed EmitPlainOrHyperlinkRun, skipping the warning emission even for marked runs.
Fix: stamp _wrapperFlattened for runs under a CustomXmlRun ancestor, and extract the warning emission into a shared WarnWrapperFlattened used by both the collapse path and EmitPlainOrHyperlinkRun. Text content round-trips as before; the flatten is now always announced.
Injecting <w:r><w:t xml:space="preserve"> </w:t></w:r> via raw-set
saved an empty <w:t/> - the space was destroyed at write time, before
any round-trip. Visible end-to-end as nested smartTags losing their
interstitial space-only run ('John Smith' rebuilt as 'JohnSmith').
Root cause: ParseFragment parsed fragments with XDocument
LoadOptions.None, which discards whitespace-only text nodes wholesale -
correct for formatting whitespace between elements, wrong for leaf
content.
Fix: parse fragments whitespace-preserved, then normalize: whitespace
text nodes whose parent has element children (formatting indentation)
are removed as before; whitespace that is a leaf element's entire
content is kept and stamped xml:space="preserve" so the subsequent SDK
InnerXml parse and later document reopens keep it too. Applies to
raw-set on all three formats; the change is additive (previously-dropped
content is now preserved), with no effect on element structure.
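The normalisation rule (drop whitespace between elements, keep whitespace that is a leaf's entire content) can be sketched with Python's `xml.etree.ElementTree`, which keeps whitespace text when parsing. The function name is hypothetical and the sketch stands in for the XDocument-based ParseFragment only conceptually.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def normalize_whitespace(elem: ET.Element) -> None:
    has_children = len(elem) > 0
    if elem.text is not None and elem.text.strip() == "":
        if has_children:
            elem.text = None                  # indentation before the first child
        elif elem.text != "":
            elem.set(XML_SPACE, "preserve")   # leaf whose whole content is whitespace
    for child in elem:
        normalize_whitespace(child)
        # Whitespace after a child sits inside a parent that has element
        # children, so it is formatting indentation.
        if child.tail is not None and child.tail.strip() == "":
            child.tail = None

root = ET.fromstring("<r>\n  <t>John</t>\n  <t> </t>\n  <t>Smith</t>\n</r>")
normalize_whitespace(root)
print("".join(t.text for t in root))  # prints John Smith
```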
…e part
A docx whose word/theme/theme1.xml exists but is empty (or a bare <a:theme/> without themeElements) dumped as a raw-set remove of the theme part - the rebuilt document had NO theme at all. The input was already broken (a theme without themeElements is schema-invalid and Word refuses to open it), and the round-trip preserved the brokenness in a different shape instead of healing it.
Root cause: EmitThemeRaw distinguishes 'theme part absent' from 'theme part present but degenerate' via the typed .Theme accessor, which is null for both, so the present-but-degenerate case fell into the remove branch meant for genuinely theme-less documents.
Fix: probe the package part URIs for /word/theme/ - a present-but-degenerate part falls through to BlankDocCreator.BuildDefaultTheme, the same schema-complete default theme a blank document gets, so the rebuilt file opens in Word. A genuinely absent theme still emits the remove.
… a preceding merged cell Column operations address cells by their ordinal position in the row (cells[colIdx-1]). That equals the target grid column only when no earlier cell in the row spans horizontally; a gridSpan before the target column shifts the ordinal, so the operation silently acted on the WRONG cell — e.g. removing column 3 in a row whose first cell spans columns 1-2 deleted the column-4 cell and kept column 3. The merge guard inspected the same ordinal cell, so a preceding span both evaded the guard and misdirected the op (silent data corruption). The track-change column delete and the add-column boundary check had no slot-aware guard at all. Replace the ordinal merge check with a slot-aware guard that walks each row accumulating gridSpan: it rejects when the target grid slot is itself merged (gridSpan/vMerge) or when a preceding horizontal span makes the ordinal differ from the slot. Wired into remove/move/copy/add-column and the track-change column delete. This honors the existing "unmerge before column-level operations" contract, turning silent corruption into a clear, actionable error; clean (unmerged) tables are unaffected.
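A slot-aware guard of the kind described above might look like this in Python. The `Cell` model, the exception type, and the function name are hypothetical; the sketch only shows the gridSpan accumulation and the two rejection conditions.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    grid_span: int = 1      # number of grid columns the cell covers
    v_merge: bool = False   # part of a vertical merge

class MergedColumnError(ValueError):
    pass

def cell_ordinal_for_column(row, col):
    """Map a 1-based grid column to a 0-based cell ordinal, or reject."""
    slot = 1
    for ordinal, cell in enumerate(row):
        last = slot + cell.grid_span - 1
        if slot <= col <= last:
            if cell.grid_span > 1 or cell.v_merge:
                raise MergedColumnError("target slot is merged; unmerge first")
            if ordinal != col - 1:
                raise MergedColumnError("a preceding span shifts the cell ordinal")
            return ordinal
        slot = last + 1
    raise IndexError(f"row has no grid column {col}")
```

In the example from the commit, a row whose first cell spans columns 1-2 has the column-3 cell at ordinal 1, not 2, so the guard rejects the operation instead of letting `cells[colIdx-1]` hit the column-4 cell.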
…) round-trips instead of dangling A presentation extLst can reference a custom binary part by relationship id — Google Slides exports <go:slidesCustomData r:id="rIdN"> pointing at ppt/metadata (rel type .../presentationmetadata, content type application/binary). EmitPresentationExtras replays that extLst verbatim via raw-set, carrying the r:id, but nothing re-created the relationship or the target part: the rebuilt presentation.xml then carried a dangling rId and PowerPoint refused the deck. Surface presentation-attached ExtendedParts via GetPresentationExtendedParts and emit an add-part extpart row on /presentation that pins the source rId, the custom rel type, content type and bytes. Extend the extpart add-part host set to accept /presentation (alongside slide/layout/master). The part re-homes to the SDK's ExtendedPart location but resolves by r:id, so the reference binds and the bytes round-trip verbatim.
… of rejecting the picture add A TIFF image part can carry the content type image/tif as well as the canonical image/tiff. The MIME validation only recognised image/tiff, so a deck with an image/tif picture aborted its add step with 'Unsupported MIME type: image/tif'. Accept image/tif alongside image/tiff (mirroring the existing image/jpeg / image/jpg alias), and add the same alias to the four thumbnail / image content-type maps in the handler so a tif part is not silently relabelled png.
… dangling A SmartArt diagram node can carry an external hyperlink (<a:hlinkClick r:id>) whose relationship lives on the diagram data part's (or the DSP cached-drawing part's) OWN .rels. add-part smartart recreated both parts empty and re-attached only the embedded ImageParts, so the hyperlink relationship was dropped: the replayed data/drawing XML kept the verbatim r:id but its .rels no longer declared it, leaving a dangling relationship that PowerPoint refused (0x80070570). Carry each diagram part's external hyperlink relationships (rId + target) on the SmartArtInfo, emit them as numbered dataHlink/drawingHlink props, and re-add them via AddHyperlinkRelationship with the pinned source rId in the add-part smartart handler. Mirrors the existing embedded-image carrier (dataImage/drawingImage).
Integrates the pptx dump→batch round-trip campaign branch (73 fix(pptx) commits) covering carrier round-trips (images, themes, tags, extended/custom parts, external links, slide-jump links, SmartArt images+hyperlinks, presentation-level Google metadata), scheme/system/pattern colors, signed color-transform offsets, connector variants, empty charts, negative insets, group-child id stability, nested group set routing, and table cell txBodyRaw robustness.
…round-trip A table cell shaped [nested table, display equation] lost the equation on dump→batch. The equation paragraph was emitted as `add equation` targeting p[cellParaIdx] (= p[1]), but at replay p[1] is the cell's leading outer-seed paragraph; the nested-table lead-cleanup then issues `remove p[1]`, deleting that paragraph together with the equation just placed in it. The empty paragraph the SDK seeds AFTER the nested table — the one a plain paragraph correctly reuses via set p[last()] — was left untouched. Mirror the plain-paragraph trailing-auto-p handling for equations: treat an equation immediately after a nested table as the trailing auto-present paragraph, and target p[last()] (the seeded post-table paragraph that survives the remove) instead of p[cellParaIdx]. Equation-only cells and equations after a text paragraph are unchanged.
… of dangling A slideMaster/slideLayout can host an <p:oleObj r:id="rIdN"> (e.g. an embedded clip-art OLE object) whose part is an EmbeddedObjectPart / EmbeddedPackagePart on the master's own .rels. The master XML is replayed verbatim via raw-set (keeping r:id="rIdN"), but the extended-part carrier only re-created ExtendedPart blobs, so the OLE part + its relationship were dropped: the rebuilt master's r:id dangled and PowerPoint refused the deck (0x80070570). Broaden ReadExtendedPartInfos to also surface EmbeddedObjectPart and EmbeddedPackagePart (alongside ExtendedPart), carrying each part's relationship type, content type and bytes. They flow through the existing master/layout add-part extpart carrier, which re-pins the source rId; the OLE relationship is recreated with its .../oleObject type and the bytes round-trip verbatim. ImageParts stay excluded (carried separately by GetMasterImageParts).
…ad of being dropped A cell holding two block content controls — e.g. [plain SDT, rich SDT] — lost the second one on dump→batch. EmitCellSdt chooses between inserting the SDT before the cell's auto-seed paragraph (leading content) and appending it (non-leading) from a `cellHasContent` flag, which was fed `firstParaSeen` — a signal that only tracks PARAGRAPHS. A preceding SDT (or nested table) leaves firstParaSeen false, so the second SDT still took the insert-before-seed path; but the first SDT had already consumed or displaced that seed paragraph, so the raw-set targeted a paragraph that no longer existed and the control was dropped. Track whether ANY cell content (paragraph, nested table, or SDT) has been emitted and feed that to EmitCellSdt, so only a genuinely leading SDT inserts before the seed and every later one appends. The seed-consumed flag that drives a following paragraph's fresh `add p` keys off the same signal. Sole SDT, leading SDT, SDT-then-paragraph and paragraph-then-SDT cells are unchanged.
Brings the round-36 fix onto main: a slideMaster/slideLayout <p:oleObj r:id> (embedded OLE object / package) is now re-pinned via the extended-part carrier so the relationship resolves instead of dangling on round-trip.
…ips instead of flattening to text A content control inside a table cell whose rich content referenced an external relationship (a hyperlink, or an embedded image) was flattened to plain text on dump→batch — the link, run formatting and multi-paragraph structure were lost. The cell SDT emitter bailed straight to the text emit on any external rel, while the body-level emitter already ships such an SDT through the inlined-parts carrier (verbatim sdtXml + part/ext data with rel ids rewritten on replay). The cell path simply lacked that carrier. Mirror the body path in EmitCellSdt: when the rich cell SDT carries an external rel, try the GetSdtEmitData carrier and emit `add sdt sdtXml=…`; only fall back to the text flatten when a referenced part can't be resolved. Because the carrier's `add sdt` appends after the cell's auto-seed paragraph, drop that now-leading seed when the control is the cell's leading content so the rebuilt cell matches the source shape; non-leading controls append after existing content and need no cleanup. Header/footer hosts (no auto-seed) skip the seed removal.
…al so following tables round-trip in place A body-level rich content control whose content includes one or more tables is shipped verbatim by the dump carrier (sdtXml). The carrier emits the SDT — including its inner <w:tbl> — without routing through EmitTable, so ctx.TableOrdinalBox is never advanced for the shipped tables. At replay those tables still exist in document order and count toward the `(//w:tbl)[N]` XPath that later cell-SDT / tblGrid raw-sets resolve against, so every table that follows the carrier had its selector land one (or more) tables early. The result: a sibling table's cell content control wrapped the wrong cell, and the spurious nested SDT that produced was dropped on the next SDK re-save, taking that cell's drawing with it. Advance TableOrdinalBox by the number of <w:tbl> opens in the shipped sdtXml right after the carrier is emitted, keeping the emitter's `(//w:tbl)` numbering in lockstep with replay. Tables following such a carrier now round-trip in place with their cell content controls and drawings intact.
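Counting the table opens in a shipped fragment can be sketched with a regular expression. This Python snippet assumes the fragment uses the conventional `w:` prefix, which is an assumption; a namespace-aware parse would be more robust.

```python
import re

# Match '<w:tbl' only when followed by whitespace, '>' or '/', so that
# <w:tblPr>, <w:tblGrid> and closing </w:tbl> tags are not counted.
_TBL_OPEN = re.compile(r"<w:tbl(?=[\s>/])")

def count_table_opens(sdt_xml: str) -> int:
    return len(_TBL_OPEN.findall(sdt_xml))

def advance_table_ordinal(current: int, sdt_xml: str) -> int:
    # Keep the emitter's (//w:tbl)[N] numbering in step with replay.
    return current + count_table_opens(sdt_xml)
```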
…alformed num val=auto A dataBar conditional-formatting rule documents its min/max bounds as "numeric or 'auto'", where 'auto' requests automatic bounds. The add path treated any non-null bound as a literal numeric value, so min=auto/max=auto serialized as <cfvo type="num" val="auto"/> (and x14 <cfvo type="num"><f>auto</f>). That is malformed — a num-typed cfvo requires a numeric val — so Excel silently dropped the entire data bar on open (no bars rendered, though the file still validated against the lax schema). Normalize the 'auto' sentinel (case-insensitive) back to null before building the cfvo elements, so both the 2007 dataBar and the 2010+ x14 counterpart take their automatic-bound branches (type=min/max, autoMin/autoMax) — identical to omitting the bound. Explicit numeric bounds are unaffected.
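The sentinel normalisation could be sketched as follows in Python. The helper names and the string-built `<cfvo>` output are illustrative assumptions; only the rule itself ('auto' in any case behaves like an omitted bound) comes from the commit.

```python
def normalize_bound(bound):
    """Map the 'auto' sentinel (any case) to None; keep everything else."""
    if bound is None:
        return None
    text = str(bound).strip()
    return None if text.lower() == "auto" else text

def build_cfvo(bound, auto_type: str) -> str:
    # auto_type is "min" or "max", the automatic-bound type for this end.
    value = normalize_bound(bound)
    if value is None:
        return f'<cfvo type="{auto_type}"/>'
    return f'<cfvo type="num" val="{value}"/>'
```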
… xlsx CF rule family The excel examples covered cell formatting, charts, and pivot tables but not conditional formatting — the conditionalformatting element and its ~30 rule types had zero coverage. Add a 7-sheet showcase, one rule family per sheet: cellIs comparison, text matching, top/bottom/average, data bars, colour scales, icon sets, and formula/date/duplicate/unique rules. The build script drives the officecli Python SDK (resident pipe + batched writes) rather than per-command subprocess calls, demonstrating the SDK as the intended consumer for many-write workbooks. It falls back to the in-repo SDK copy when officecli-sdk is not pip-installed. Validation runs against the saved file from a fresh process, since CF differential fills live in the workbook-level dxfs table.
… instead of dropping the row A <w:sdt> (SdtRow) that is a direct child of <w:tbl> and whose sdtContent wraps an entire <w:tr> — Word's locked-row shape, used by forms to make a whole row read-only — was silently dropped on dump→batch round-trip: the row, its cells, text and <w:lock> all vanished and the table rebuilt short. Root cause: navigation enumerated rows with table.Elements<TableRow>(), which sees only direct children, so a row nested inside an SdtRow wrapper was invisible to Get / Query / dump and never emitted. CT_Tbl's content model permits an SDT around one or more rows, so this is valid input. Add GetTableRowsFlattened (mirroring the existing GetRowCellsFlattened cell-flatten contract) and route the table row enumeration in TableToNode / the tr navigation axis / the row walk through it, so wrapped rows are counted and their cells/text round-trip via the typed emit. After the typed row/cell content is applied, EmitTable patches each single-row wrapper back to its verbatim <w:sdt> via raw-set replace in descending row order (replacing w:tr[N] with the sdt removes it from the w:tr axis), which restores the wrapper and its lock. Tables with row-level content controls now round-trip with every row, its content and its lock intact.
…flushed style edits
ValidateDocument validates a throwaway Clone of the package to avoid touching
the live document. But Clone(stream) reads every live part's stream, which
re-introduces the very desync the clone was meant to avoid: cloning a package
that has a loaded-but-unflushed StylesPart (e.g. one just created by a fill/font
edit) desyncs the SDK's dirty-tracking for that part, and it then serializes
EMPTY on the caller's next Save. The visible failure was "edit a cell style ->
validate in-session -> save" producing a 0-byte styles.xml ("Root element is
missing") and, for conditional-formatting rules, a cascade of dangling-dxfId
errors — a file the spreadsheet app reports as corrupt.
Flush each already-loaded part's DOM back to its stream before the clone, so
both Clone and the preflight read in-sync bytes and cannot desync. Only loaded
parts are flushed: an unloaded part cannot be dirty, and force-loading one would
make the caller's Save re-serialize an untouched part — validate must stay
read-only. DocumentFormat.OpenXml 3.x exposes no public IsRootElementLoaded, so
the loaded-root state is read from the private field reflectively, with the
whole pass best-effort (any per-part hiccup is swallowed; a renamed field
degrades to a no-op).
…t property surface The word examples covered run, paragraph, table, and numbering formatting but not the document-level surface — the `document` container's 67 settable properties had no example. Add a showcase covering all seven groups: core + extended metadata, page setup (size/orientation/margins/mirror/book-fold), docDefaults (the run/paragraph defaults unstyled text inherits), theme palette and major/minor fonts, CJK grid and spacing controls, font embedding, and display/print/privacy flags. Built on the officecli Python SDK (resident pipe + batched writes), with the in-repo SDK fallback. Body paragraphs are intentionally unstyled so the docDefaults font/size/colour inheritance is visible in the rendered document.
…rlink/image relationship on that part, not the main document
A rich content control (SDT) carrying an external hyperlink (or an image) that sits directly at the root of a header or footer round-tripped with its relationship registered on word/_rels/document.xml.rels instead of the header/footer part's own rels. The r:id kept verbatim in word/footerN.xml then pointed at a relationship that part did not have, so Word treated the whole document as corrupt and refused to open it.
Root cause: ResolveImageHostPart walked run.Ancestors<Header>() / run.Ancestors<Footer>() to find the host part. Ancestors excludes self, and the SDT carrier in AddSdt passes the Footer/Header element ITSELF as the parent when an `add sdt parent=/footer[N]` lands the control at the part root, so the lookup found no header/footer ancestor and fell through to MainDocumentPart. The relationship was created on the main document while the dangling r:id stayed in the footer.
Use a self-or-ancestor walk (run as Footer ?? run.Ancestors<Footer>(), and likewise for Header), mirroring the BUG-R14A fix already in ResolveHostPart. Header/footer-root content controls that carry hyperlinks or images now register their relationships on the correct part and the file opens.
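The ancestors-only versus self-or-ancestor distinction can be shown with a toy tree. This is a hedged Python sketch, not the Open XML SDK: `Node`, `host_part_ancestors_only` and `host_part_self_or_ancestor` are hypothetical names, and the string results stand in for the real part objects.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

HOST_KINDS = ("header", "footer")


@dataclass
class Node:
    kind: str
    parent: Optional["Node"] = None


def ancestors(node: Node) -> Iterator[Node]:
    """Yield parents from nearest to root; the node itself is excluded."""
    cur = node.parent
    while cur is not None:
        yield cur
        cur = cur.parent


def host_part_ancestors_only(node: Node) -> str:
    """The lookup shape described as buggy: never inspects the node itself."""
    for a in ancestors(node):
        if a.kind in HOST_KINDS:
            return a.kind
    return "main"


def host_part_self_or_ancestor(node: Node) -> str:
    """The fixed shape: check the node first, then fall back to its ancestors."""
    if node.kind in HOST_KINDS:
        return node.kind
    return host_part_ancestors_only(node)


footer = Node("footer")
run_in_footer = Node("run", Node("paragraph", footer))
```

For a run nested inside the footer both lookups agree. They differ only when the footer element itself is passed in, which is the case the commit describes: the ancestors-only walk falls through to the main document, and the self-or-ancestor walk resolves to the footer.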
…s instead of being dropped
The "forced page break, then a new section" idiom (a paragraph whose pPr holds a <w:sectPr> and whose body is a single <w:r><w:br w:type="page"/></w:r>) lost its page break on dump→batch. The section-carrier paragraph emits its runs through a dedicated filter that admitted text, tab, bookmark, SDT and drawing runs but not a pure break run (which surfaces as a child of type "break" with empty text). The break fell through the run/r/picture gate and was dropped, so the forced page break collapsed and every page after the section boundary reflowed.
Admit type=="break" children in the carrier filter and emit them through TryEmitBreakRun, the same helper the main paragraph run loop uses, so a page / column / line break carried on a section-break paragraph survives.
…atting, not just color/underline/font/size/bold/italic
AddHyperlink built the wrapped run's rPr by hand and only covered color, underline, the font slots, size, bold and italic. Every other character property the source set on the link run was silently dropped on dump→batch. The most consequential was <w:vanish/>: a hidden hyperlink (the boilerplate / template-guidance links Word documents bury in vanish text) came back as visible text and shifted the surrounding layout. The per-script bold.cs / italic.cs / size.cs, the run languages, caps, smallCaps, strike, vertAlign, position and friends were lost the same way.
Route the remaining character keys through the shared ApplyRunFormatting helper (the same applier the plain-run path uses) after the special color/underline/size/bold/italic/font handling. None of the added keys collide with those slots, so a hyperlink run now preserves its vanish, complex-script weights/size, language and decoration through the round-trip.
…get works
A blank officecli xlsx was the only format that did not stamp a theme part; docx and pptx blanks both do, and every real Excel workbook ships one. Without it, setting workbook theme properties (theme.color.accentN, theme.font.major/minor) silently no-opped: ThemeHandler had no ThemePart to write into, so the keys were reported unsupported and theme-colour lookups came back empty, even though the schema declares the surface settable.
Stamp the shared default theme into the WorkbookPart at create time, matching the docx/pptx blanks. workbook theme.* now resolves and round-trips on a freshly created file, closing the cross-format parity gap.
…rtion
The DrawingML effectLst child-ordering insert (blur → fillOverlay → glow → innerShdw → outerShdw → prstShdw → reflection → softEdge) was implemented three times: a table-driven copy in Core/DrawingEffectsHelper (run-level rPr), a byte-identical table-driven copy in PowerPointHandler.Effects (shape-level spPr), and a hand-written switch variant in ExcelHandler.Helpers.Drawing. The Core copy's comment even documented the PPT array as a manually kept-in-sync mirror.
Promote DrawingEffectsHelper.InsertEffectInSchemaOrder to internal and route all shape-level callers through it; delete the PPT array + InsertEffectInOrder method (13 call sites repointed) and the Excel switch variant (1 call site). The three implementations were behavior-equivalent (same schema order, same empty-list and unknown-type fallback to AppendChild), so this is a pure de-duplication with no runtime change. Effect/schema-order suite (333 tests) green.
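A table-driven schema-order insert of the kind described above can be sketched as follows. This is an illustrative Python version over a list of element names, not the C# helper: `insert_effect_in_schema_order` is a hypothetical analogue, the order table is copied from the commit message, and the sketch does not replace an existing effect of the same type.

```python
from typing import List

# Child order of effectLst as listed in the commit message.
EFFECT_ORDER = [
    "blur", "fillOverlay", "glow", "innerShdw",
    "outerShdw", "prstShdw", "reflection", "softEdge",
]
RANK = {name: i for i, name in enumerate(EFFECT_ORDER)}


def insert_effect_in_schema_order(children: List[str], new: str) -> None:
    """Insert `new` before the first existing child that must follow it.

    Unknown effect types, and inserts into an empty list, fall back to a
    plain append, matching the fallback the commit message describes.
    """
    rank = RANK.get(new)
    if rank is None:
        children.append(new)
        return
    for i, existing in enumerate(children):
        # Existing children with unknown names are skipped (rank -1).
        if RANK.get(existing, -1) > rank:
            children.insert(i, new)
            return
    children.append(new)


effects: List[str] = []
for name in ["outerShdw", "blur", "glow", "softEdge", "custom"]:
    insert_effect_in_schema_order(effects, name)
```

Keeping the order in one table is what makes a single shared helper practical: every caller gets the same ordering and the same fallback without maintaining a mirror.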
…longer gains a spurious checkbox glyph
AddFormField unconditionally wrote a ☐ / ☒ glyph as the FORMCHECKBOX field result. Word renders the box from the ffData <w:checkBox>, and many documents leave the field's cached result empty; the dump captures that faithfully as text="". On replay the unconditional glyph turned every such empty checkbox into a literal ☐ run (113 of them in a medical-device questionnaire), so the rebuilt text no longer matched the source and the glyph's font metrics, which differ from the ffData-rendered box, nudged form/table layout.
Only synthesize the default glyph for a typed `add formfield type=checkbox` that supplies no explicit result (no text/value key). When the dump passes an explicit result (the cached glyph when the source stored one, or "" when it did not), honor it. An empty-result checkbox now round-trips empty, a cached or checked checkbox round-trips its glyph, and a fresh typed add still gets a default ☐ / ☒.
…set+get), matching docx
The pptx table-cell help schema split a single semantic value (a cell's horizontal span) across two properties: `colspan` (declared read-only/get) and `gridSpan` (declared settable). That violates "one canonical key per value" and diverges from docx, which models it as one `colspan` (set+get, alias `gridspan`). The visible symptom: set a span via `gridSpan` and `get` returned it under `colspan`, so a schema-driven reader looking for `gridSpan` on get found nothing, even though the value round-tripped.
The handler was already docx-consistent (accepts `gridspan`/`colspan` on set, emits `colspan` on get); only the schema was wrong. Merge the two declarations into a single `colspan` (set+get, alias `gridspan`), dropping the standalone `gridSpan`. `rowspan` stays get-only (pptx sets vertical merge via `merge.down`).
… width
Chart/shape/picture overlays are absolutely positioned into a per-anchor box on top of the sheet grid. Two issues kept charts from matching Excel:
1. Fill: the card was sized from its width alone (the inner SVG used height:auto), so the plot never grew into the height left below the title, leaving an empty gap under the chart. Make the card a flex column (height:100%) so the plot fills the box; for a left/right legend, shrink the plot viewBox to the real plot area so the meet-fit does not letterbox.
2. Width: the box summed whole spanned columns but dropped the partial-column EMU offset, and EstimateChartSize fell back to 48pt (64px) for default columns instead of the grid's ~44.27pt (59px), clamping the card ~one column narrower than Excel. Feed each chart its EstimateChartSize width (offset included, grid-aligned metric) as the overlay box width; shapes/pictures pass 0 and keep the column-sum.
Validation: render a multi-chart .xlsx and compare to Excel. A 480px chart now renders ~444px ending mid-column-N (was clamped a column short), and every gallery chart aligns to its anchor row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
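The unit arithmetic behind the width fix can be checked directly. The sketch below is a Python illustration under stated assumptions, not the project's EstimateChartSize: it assumes 96 CSS pixels per inch (so 72pt = 96px and 9525 EMU = 1px) and a two-cell anchor expressed as a start column plus offset and an end column plus offset. `anchor_width_px` and the sample values are hypothetical.

```python
from typing import Sequence

EMU_PER_INCH = 914400
PX_PER_INCH = 96  # assumed CSS reference resolution
EMU_PER_PX = EMU_PER_INCH // PX_PER_INCH  # 9525


def pt_to_px(pt: float) -> float:
    """Convert points (1/72 inch) to pixels at 96 px per inch."""
    return pt * PX_PER_INCH / 72


def emu_to_px(emu: int) -> float:
    """Convert English Metric Units to pixels at 96 px per inch."""
    return emu / EMU_PER_PX


def anchor_width_px(
    col_widths_px: Sequence[float],
    from_col: int,
    from_off_emu: int,
    to_col: int,
    to_off_emu: int,
) -> float:
    """Width of a box spanning from (from_col + offset) to (to_col + offset).

    Whole columns from_col..to_col-1 are summed, then the start offset is
    subtracted and the end offset added, so partial columns are kept.
    """
    whole = sum(col_widths_px[from_col:to_col])
    return whole - emu_to_px(from_off_emu) + emu_to_px(to_off_emu)


# The two default-column metrics quoted in the commit message.
old_default_px = round(pt_to_px(48))     # 64
grid_default_px = round(pt_to_px(44.27)) # 59

# Illustrative anchor: starts 10px into column 2, ends 20px into column 5.
columns = [grid_default_px] * 10
width = anchor_width_px(columns, 2, 95250, 5, 190500)
```

With these sample values the box is 187px wide (three 59px columns, minus the 10px start offset, plus the 20px end offset). Dropping the offsets would give 177px, which is the kind of shortfall the commit describes.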
…eight
Rows without an explicit height got no height attribute, so the grid let them shrink to content: empty rows rendered ~17px and 11pt data rows ~22px, while Excel renders default rows at 15pt (~20px). That distorts the grid vertically and drifts it out of step with the chart overlay's anchor math, which assumes the sheet default row height.
Give every row the explicit-or-default height, and trim the cell's vertical padding (2px -> 1px) so default 11pt rows land on 15pt instead of overgrowing. Rows with an explicit/auto-fit height (rotated text, large fonts) keep their value via the existing RowHeights path.
Validation: render any sheet. Empty rows are now 20px (were 17px), data rows 20px (were 22px), and an absolutely-positioned chart's anchor row lines up with the matching grid row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
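The explicit-or-default rule is small enough to state as code. This is a hedged Python sketch rather than the renderer's implementation: `row_height_px` is a hypothetical helper, the 15pt default is the figure quoted above, and the conversion assumes 96 px per inch.

```python
from typing import Optional

DEFAULT_ROW_HEIGHT_PT = 15.0  # default row height quoted in the commit message


def row_height_px(
    explicit_pt: Optional[float] = None,
    default_pt: float = DEFAULT_ROW_HEIGHT_PT,
) -> int:
    """Return the pixel height for a row: its explicit height if set, else the default.

    Assumes 96 px per inch, so 72pt maps to 96px.
    """
    pt = explicit_pt if explicit_pt is not None else default_pt
    return round(pt * 96 / 72)


default_px = row_height_px()     # row with no stored height
tall_px = row_height_px(30.0)    # row with an explicit 30pt height
```

Because every row resolves to a definite height, a running sum of row heights gives the same y-offset that the overlay anchor math assumes for a given row.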
…ow low
.chart-container carried `margin: 16px auto` from an older in-flow layout. Charts are now absolutely positioned inside a per-anchor box, so that top margin pushed every visible card down ~16px, landing it at the bottom of its anchor row instead of the top, and made it overflow the box bottom. Set margin: 0 so the card sits exactly on its cell anchor.
Validation: a chart anchored at row 3 now starts at row 3's top edge (was row 3's bottom) and ends on its bottom anchor row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1dc5843 to 2bbf8cd
Rebased onto current The two conflict hunks sat right next to your recent changes, so I kept yours and layered the chart fixes on top:
Re-verified all four fixes on a 12-chart gallery: charts fill their box at the correct width (~444px), sit exactly on their anchor rows, and grid rows no longer collapse below 15pt. Scatter is already handled by your own fix (cb7548b), which the rebase picked up.
The HTML preview (`view <file> html`) renders charts/shapes/pictures as absolutely-positioned overlays on the sheet grid. Comparing the output to Excel surfaced three position/size issues, fixed here as three atomic commits:
1. Chart overlays don't fill their anchor box, and render ~a column too narrow.
The card was sized from its width alone (the inner SVG used `height:auto`), so the plot never grew into the height left below the title, leaving an empty gap under the chart. And the overlay box width summed whole spanned columns but dropped the partial-column EMU offset, while `EstimateChartSize` fell back to 48pt (64px) for default columns instead of the grid's ~44.27pt (59px), together clamping the card ~one column narrower than Excel. The card is now a flex column (`height:100%`) that fills the box, and the box is sized from the chart's own `EstimateChartSize` (offset included, grid-aligned column metric).
2. Default/empty rows collapse below Excel's 15pt height.
Rows without an explicit height got no height attribute, so the grid let them shrink to content: empty rows ~17px and 11pt data rows ~22px, vs Excel's 15pt (~20px). That distorts the grid and drifts it out of step with the overlay anchor math. Every row now gets the explicit-or-default height; cell padding is trimmed (2px→1px) so default 11pt rows land on 15pt. Rows with an explicit/auto-fit height (rotated text, large fonts) keep their value.
3. Stale `.chart-container` margin offsets the overlay a row low.
`.chart-container` carried `margin: 16px auto` from an older in-flow layout. With charts now absolutely positioned, that top margin pushed each visible card down ~16px (to the bottom of its anchor row) and overflowed the box bottom. Set to `margin: 0`.
Validation. Rendered a multi-sheet workbook (charts, conditional formatting, sparklines, rich formatting, formulas, pivot) and compared to Excel. A 480px chart now renders ~444px ending mid-column-N (was clamped a column short); every gallery chart sits exactly on its anchor row (top + bottom); default grid rows are 20px (were 17/22px). No regressions across CF (color scales, data bars, icon sets), sparklines, rotation/number-formats/merges, formulas, or the pivot cross-tab.
The three commits are independent and can be split into separate PRs if you prefer one atomic fix per PR — happy to do that.
🤖 Generated with Claude Code